-
Notifications
You must be signed in to change notification settings - Fork 4.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update web backend get connection to properly handle field selection and schema refresh #20323
Conversation
…and schema refresh with newly-discovered fields
@@ -357,7 +360,7 @@ public WebBackendConnectionRead webBackendGetConnection(final WebBackendConnecti | |||
connection.setStatus(refreshedCatalog.get().getConnectionStatus()); | |||
} else if (catalogUsedToMakeConfiguredCatalog.isPresent()) { | |||
// reconstructs a full picture of the full schema at the time the catalog was configured. | |||
syncCatalog = updateSchemaWithDiscovery(configuredCatalog, catalogUsedToMakeConfiguredCatalog.get()); | |||
syncCatalog = updateSchemaWithDiscovery(configuredCatalog, catalogUsedToMakeConfiguredCatalog.get(), catalogUsedToMakeConfiguredCatalog.get()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are we just passing the same catalogUsedToMakeConfiguredCatalog.get()
in twice?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh hm, I think I see why but it's confusing, maybe in the case where there isn't a refreshed catalog, we should call a different, simpler method that takes advantage of the fact that the original discovered catalog is equal to the discovered catalog?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I can see how this is confusing.
It wouldn't actually simplify much to have two different methods - essentially it's mostly saying "reconstruct it by looking at what we persisted versus the latest catalog". This PR adds a third element: (1) what we persisted; (2) the whole catalog when we persisted it; (3) the latest catalog. It's just that in the case where the catalog hasn't been rediscovered, (2) and (3) are actually the same. But otherwise the logic stays the same.
I do agree that it stands out as weird to pass the same thing twice like that, so I added a helper method in the hopes of better readability.
WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! Makes sense, the helper makes it much clearer
originalDiscoveredStream.getStream().getJsonSchema().findPath("properties").fieldNames() | ||
.forEachRemaining((name) -> originallyDiscovered.add(name)); | ||
stream.getJsonSchema().findPath("properties").fieldNames().forEachRemaining((name) -> refreshDiscovered.add(name)); | ||
// We include a selected field if it: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
great comments, would be hard to follow without them!
final WebBackendConnectionRead result = testWebBackendGetConnection(true, connectionRead, | ||
operationReadList); | ||
|
||
// We expect the discovered catalog with two fields selected: the one that was originally selected, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also really appreciate the comments describing how the test works!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! Really helpful comments that help navigate the complexity. Also test cases look solid, great stuff!
This change broke the "Refresh Schema" button for a lot of connections on my install.
This is because of this comment, some lines below the change, the source_catalog_id is empty for those connections: airbyte/airbyte-server/src/main/java/io/airbyte/server/handlers/WebBackendConnectionsHandler.java Lines 351 to 358 in 8252783
There should be a check to see if airbyte/airbyte-server/src/main/java/io/airbyte/server/handlers/WebBackendConnectionsHandler.java Line 343 in 8252783
|
What
Make the web backend handler properly handle field selection when there is a schema refresh.
How
Only include a selected field if:
is in the newly discovered schema
ANDit was either originally selected OR not in the originally discovered schema at all
.TODO before merging: add another unit test for the case where a field is removed in the newly-discovered schema.
Recommended reading order
Reading the new unit test might make the intended logic clearer, and therefore easier to review the actual code change.